Cython

Introduction

Cython is an optimizing static compiler for Python that gives you the combined power of Python and C.

Normally, resorting to C code sacrifices two of the main reasons for using Python, namely the high-level language and cross-platform code. Cython, however, translates ordinary Python code into C code, which is compiled on the target machine and imported as a normal library.

The process requires the cython package, a setup script and a compiler.

  • The cython package is available from the homepage http://cython.org/, from the Python Package Index, and is bundled with distributions such as WinPython and Anaconda (a quick check is shown after this list).
  • The setup script can be generated automatically by the 'cython_import.py' script (found in the folder of this notebook).
  • The compiler (from http://docs.cython.org/src/quickstart/install.html):
    • Linux: The GNU C Compiler (gcc) is usually present, or easily available through the package system. On Ubuntu or Debian, for instance, the command sudo apt-get install build-essential will fetch everything you need.
    • Mac OS X: To retrieve gcc, one option is to install Apple’s XCode, which can be retrieved from the Mac OS X’s install DVDs or from http://developer.apple.com.
    • Windows: A popular option is to use the open source MinGW (a Windows distribution of gcc). See the appendix for instructions for setting up MinGW manually. EPD and Python(x,y) bundle MinGW, but some of the configuration steps in the appendix might still be necessary. Another option is to use Microsoft’s Visual C. One must then use the same version as the one the installed Python was compiled with.
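
To check that the cython package is actually available before moving on, one quick option is to import it from Python and print its version:

import Cython                 # the compiler package itself
print(Cython.__version__)     # prints the installed Cython version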

A prime function

Run the following function by pressing Shift+Enter in the cell. The primes function returns the number of primes in the range [2, n), and the %timeit magic reports the execution time (the exercises further down report timings via the print_time decorator instead; see the notebook on decorators).


In [ ]:
def primes(n):
    count = 0
    for i in xrange(2,n):
        for j in xrange(2,i):
            if i%j==0: 
                break
        else:
            count+=1
    return count
%timeit primes(10000)

Compiling the function

Here is a copy of the script in cy_primes.py:


def primes(n):
    count = 0
    for i in xrange(2,n):
        for j in xrange(2,i):
            if i%j==0: 
                break
        else:
            count+=1
    return count

Yes, it is identical to the function above, but by running the next script it is compiled into a C extension, cy_primes.pyd (Windows) / cy_primes.so (Unix), which can then be imported.

Finally, the compiled version of primes is executed via cy_primes.primes in order to see the execution time.


In [ ]:
from cython_import import cython_import

# Translate, compile and import cy_primes.py
cython_import('cy_primes')
import cy_primes #import after compilation


# Test compiled function
%timeit cy_primes.primes(10000)

Yeah, writing C code is rather easy!

The line

`cython_import('cy_primes')`

does the magic. It:

  • Generates a cy_primes.pyx file from the code in cy_primes.py
  • Generates a setup.py file containing a build script (a sketch of such a script is shown after this list)
  • Executes the setup script which makes cython compile cy_primes.pyx into cy_primes.pyd/cy_primes.so
  • Deletes all temporary files
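
The generated setup.py is deleted again in the last step, but it is essentially the standard Cython build recipe. A minimal sketch of such a script (the exact file written by cython_import may differ) could look like this:

from distutils.core import setup
from Cython.Build import cythonize

# Compile cy_primes.pyx into a C extension (cy_primes.pyd on Windows, cy_primes.so on Unix)
setup(ext_modules=cythonize("cy_primes.pyx"))

Running such a script with python setup.py build_ext --inplace roughly corresponds to the third step above.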

But the speedup is not impressive... yet!

Static typing

One of the major differences between C and Python is that:

  • Python is dynamically typed
    • There is no need for variable declaration (Python determines the type based on the contents)
    • The type of a variable can be changed during execution
  • C is statically typed
    • All variables must be declared to specify the type
    • Variables cannot change type during execution

Dynamic typing makes it easy to code, but slow to execute, so in order to speed up the compiled function, variables must be declared.
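
As a small illustration of what dynamic typing means in practice, the following is perfectly legal Python but has no direct counterpart in C:

x = 42         # x starts out as an integer
x = "hello"    # ...and is rebound to a string; in C this would require two differently typed variables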

Variable declaration can be done in two ways: Pure and cdef

Pure

Pure provides an easy and pythonic way to apply static typing. The documentation can be found at: http://docs.cython.org/src/tutorial/pure.html

One advantage of using pure mode is that the types are declared via Python decorators. This means that if the module for some reason cannot be compiled, it can still be used as a normal Python module.

In cy_primes_pure.py, four lines are added above the function (a full sketch of the file is shown after the list below):


    import cython
    @cython.ccall
    @cython.locals(n=cython.int, count=cython.int, i=cython.int, j=cython.int)
    @cython.returns(cython.int)
    def primes(n):
        ...
  • @cython.ccall declares the function as a fast C function with a Python wrapper (i.e. it is fast and callable from Python)
    Tip: If the Python wrapper is not required, @cython.cfunc can be used (the function will be a little faster, but cannot be called from Python code)
  • @cython.locals(...) declares the types of local variables and input arguments:
    • Signed integers: char, short, int, long, longlong
    • Unsigned integers: uchar, ushort, uint, ulong, ulonglong
    • floats: float, double, longdouble
    • Boolean: bint
    • Python types: list, tuple, dict
    • Pointers: p_int, pp_int,... (see http://docs.cython.org/src/tutorial/pure.html)
  • @cython.returns(...) declares the type of the returned value
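
Putting the pieces together, cy_primes_pure.py presumably looks more or less like the sketch below (the file shipped with this notebook may differ in details):

import cython

@cython.ccall
@cython.locals(n=cython.int, count=cython.int, i=cython.int, j=cython.int)
@cython.returns(cython.int)
def primes(n):
    count = 0
    for i in xrange(2,n):
        for j in xrange(2,i):
            if i%j==0:
                break
        else:
            count+=1
    return count

Note that even without compiling, import cy_primes_pure works and the decorators have no real effect, which is the fallback behaviour mentioned above.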

Now, the speedup should be significant


In [ ]:
from cython_import import cython_import

# Translate, compile and import cy_primes_pure.py
cython_import("cy_primes_pure")
import cy_primes_pure #import after compilation


# Test compiled function
%timeit cy_primes_pure.primes(10000)

Exercise 1

Optimize the two functions below, integrate and f, using Pure declarations (note that scipy contains a whole library for integration). It is possible to speed up the execution about 100 times (you may need to increase N, as the execution time is printed in ms).

At the bottom of the script, the test_func([func1, func2,...], (arg1, arg2,...)) call performs a test in which a Cython-compiled version of the two functions is compared to the Python version.

Note: The kernel must be restarted before every launch. Otherwise the compiled library cannot be replaced.


In [ ]:
import cython 

# write your pure declarations here
def f(x):
    return x**2-x

# write your pure declarations here
def integrate(a, b, N): 
    s = 0
    dx = float(b-a)/N  # float division; a, b and N are passed in as integers
    for i in xrange(N):
        s += f(a+i*dx)
    return s * dx

# Compile, import and compare compiled version to python version
from test_func import test_func
a,b,N = 0,10000,1000000 #testing arguments
test_func([integrate, f],(a,b,N))

cdef

Strictly speaking, Cython translates Pyrex code rather than plain Python code into C code (in contrast to the simplified description at the beginning of this notebook). Pyrex, with the file extension .pyx, is an extension of Python that adds function and variable type declarations.

In Pyrex, types are declared using the cdef syntax: C functions with a Python wrapper are declared using cpdef, while C functions without a Python wrapper, as well as local variables, are declared via cdef, e.g.:

cpdef int primes(int n):
    cdef int count, i, j
    ...


Unfortunately, Python cannot import Pyrex code, as it is not legal Python syntax, but a workaround has been implemented in cython_import:

When cython_import generates the .pyx file from the Python code, it searches all lines for "#c" and replaces the code before the "#" with the text after it.
This means that cython_import generates the Pyrex code above from the following legal Python code (a rough sketch of this substitution is given after the example):

def primes(n): #cpdef int primes(int n):
    #cdef int count, i, j
    ...
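
The substitution itself can be sketched in a few lines of Python. The actual cython_import implementation may of course differ, but based on the description and the example above it does roughly the following per line:

def to_pyx_line(line):
    # Hypothetical sketch of the "#c" replacement: keep the indentation, drop the
    # Python code in front of the "#", and use the text after the "#" instead
    if '#c' not in line:
        return line
    indent = line[:len(line) - len(line.lstrip())]
    return indent + line.split('#', 1)[1]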


Take a look at cy_primes_cdef.py, where this declaration is implemented, and run the following script. The execution time should be similar to the pure approach.


In [ ]:
from cython_import import cython_import

# Translate, compile and import cy_primes_cdef.py
cython_import("cy_primes_cdef")
import cy_primes_cdef #import after compilation

# Test compiled function
%timeit cy_primes_cdef.primes(10000)

The cdef approach has an advantage over the pure approach: it supports declaration of numpy arrays (a small worked example follows the list below), e.g.:

#cimport numpy as np
def foo(A): #cpdef foo(np.ndarray[np.int_t,ndim=2] A):
    ...

  • The first parameter in the square brackets should be a numpy type followed by "_t". The suffix indicates that it is a compile time type.
  • The second parameter specifies the number of dimensions of the array. (default is 1).
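
As a small, self-contained illustration (this file is not part of the notebook material, and the declarations shown are just one possible choice), summing a 1D array of doubles with the #c convention could look like this:

import numpy as np
#cimport numpy as np

def array_sum(A): #cpdef double array_sum(np.ndarray[np.double_t, ndim=1] A):
    #cdef int i
    #cdef double s
    s = 0.
    for i in xrange(A.shape[0]):
        s += A[i]
    return s

If this sketch were saved as array_sum.py, cython_import('array_sum') would compile the typed loop to C speed, while the same file would still run as ordinary (slow) Python without compilation.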

Exercise 2

Speed up the convolution function below using cdef declarations. In this case it is possible to speed up the function more than 500 times (you may need to increase N, as print_time prints the time in ms).
Again, you must restart the kernel before every launch.


In [ ]:
import cython
import numpy as np

def convolve(A,f):    
    f_size = f.shape[0]
    f_half_size = f_size/2
    B = np.zeros_like(A, dtype=np.double)
    for i in xrange(f_half_size,A.shape[0]-f_half_size):
        s = 0.
        for j in xrange(i-f_half_size,i+f_half_size+1):
            s += A[j]
        B[i] = s/f_size

    return B

# Compile, import and compare compiled version to python version
from test_func import test_func
N = 1000000
A = np.random.randint(0,100,N) # 1D NumPy array of N random integers in the range [0, 100)
f = np.array([1,1,1])          # 1D Numpy array of integers
test_func([convolve],(A,f))
